Universal Adaptive Data Augmentation
Existing automatic data augmentation (DA) methods either ignore updating DA's
parameters according to the target model's state during training or adopt
update strategies that are not effective enough. In this work, we design a
novel data augmentation strategy called "Universal Adaptive Data Augmentation"
(UADA). Unlike existing methods, UADA adaptively updates DA's
parameters according to the target model's gradient information during
training: given a pre-defined set of DA operations, we randomly decide types
and magnitudes of DA operations for every data batch during training, and
adaptively update DA's parameters along the gradient direction of the loss
with respect to DA's parameters. In this way, UADA can increase the training loss of
the target networks, so that the target networks learn features from harder
samples and generalize better. Moreover, UADA is very general and can
be utilized in numerous tasks, e.g., image classification, semantic
segmentation and object detection. Extensive experiments with various models
are conducted on CIFAR-10, CIFAR-100, ImageNet, tiny-ImageNet, Cityscapes, and
VOC07+12 to demonstrate the significant performance improvements brought by our
proposed adaptive augmentation.
Comment: under submission
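The update rule described in the UADA abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the one-parameter linear model, the brightness-like augmentation, the sign-based step, and the names `adaptive_magnitude`, `step`, and `max_mag` are all hypothetical. The sketch samples a random magnitude for a batch and then moves it one step in the direction that increases the loss.

```python
import random


def augment(x, magnitude):
    """Toy DA operation (assumed): brightness-like scaling of each input."""
    return [xi * (1.0 + magnitude) for xi in x]


def loss(w, x, y):
    """Mean squared error of a one-parameter linear model w * x."""
    return sum((w * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def loss_grad_wrt_magnitude(w, x, y, magnitude):
    """Analytic d(loss)/d(magnitude) for the toy model above.

    The prediction is w * x_i * (1 + m), so its derivative in m is w * x_i.
    """
    return sum(
        2.0 * (w * xi * (1.0 + magnitude) - yi) * w * xi
        for xi, yi in zip(x, y)
    ) / len(x)


def adaptive_magnitude(w, x, y, rng, step=0.1, max_mag=0.5):
    """Sample a random magnitude, then step it toward higher loss."""
    m0 = rng.uniform(-max_mag, max_mag)          # random magnitude per batch
    g = loss_grad_wrt_magnitude(w, x, y, m0)     # gradient w.r.t. DA parameter
    sign = (g > 0) - (g < 0)                     # assumed sign-based ascent step
    m1 = max(-max_mag, min(max_mag, m0 + step * sign))
    return m0, m1


rng = random.Random(0)
x, y, w = [0.5, 1.0, 1.5], [1.0, 2.0, 3.0], 1.8
m0, m1 = adaptive_magnitude(w, x, y, rng)
print(loss(w, augment(x, m0), y), loss(w, augment(x, m1), y))
```

Because the toy loss is convex in the magnitude, a step along the gradient sign never lowers it, which mirrors the abstract's claim that the adapted augmentation yields harder samples. A real model would obtain the gradient through automatic differentiation rather than a hand-written formula.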
Influencer Backdoor Attack on Semantic Segmentation
When a small number of poisoned samples are injected into the training
dataset of a deep neural network, the network can be induced to exhibit
malicious behavior during inference, which poses potential threats to
real-world applications. While they have been intensively studied in
classification, backdoor attacks on semantic segmentation have been largely
overlooked. Unlike classification, semantic segmentation aims to classify every
pixel within a given image. In this work, we explore backdoor attacks on
segmentation models to misclassify all pixels of a victim class by injecting a
specific trigger on non-victim pixels during inference, which is dubbed
Influencer Backdoor Attack (IBA). IBA is expected to maintain the
classification accuracy of non-victim pixels and mislead the classification of
all victim pixels in every single inference. Specifically, we consider two
types of IBA scenarios, i.e., 1) Free-position IBA: the trigger can be
positioned freely except for pixels of the victim class, and 2) Long-distance
IBA: the trigger can only be positioned somewhere far from victim pixels, given
the possible practical constraint. Based on the context aggregation ability of
segmentation models, we propose techniques to improve IBA for the scenarios.
Concretely, for free-position IBA, we propose a simple, yet effective Nearest
Neighbor trigger injection strategy for poisoned sample creation. For
long-distance IBA, we propose a novel Pixel Random Labeling strategy. Our
extensive experiments reveal that current segmentation models do suffer from
backdoor attacks, and verify that our proposed techniques can further increase
attack performance.
General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments
Deep Neural Networks (DNNs) are vulnerable to black-box adversarial
attacks that are highly transferable. This threat comes from the distribution gap
between adversarial and clean samples in the feature space of the target DNNs. In
this paper, we use Deep Generative Networks (DGNs) with a novel training
mechanism to eliminate the distribution gap. The trained DGNs align the
distribution of adversarial samples with clean ones for the target DNNs by
translating pixel values. Different from previous work, we propose a more
effective pixel level training constraint to make this achievable, thus
enhancing robustness on adversarial samples. Further, a class-aware
feature-level constraint is formulated for integrated distribution alignment.
Our approach is general and applicable to multiple tasks, including image
classification, semantic segmentation, and object detection. We conduct
extensive experiments on different datasets. Our strategy demonstrates its
unique effectiveness and generality against black-box attacks.
InsightMapper: A Closer Look at Inner-instance Information for Vectorized High-Definition Mapping
Vectorized high-definition (HD) maps contain detailed information about
surrounding road elements, which are crucial for various downstream tasks in
modern autonomous driving vehicles, such as vehicle planning and control.
Recent works have attempted to directly detect the vectorized HD map as a point
set prediction task, resulting in significant improvements in detection
performance. However, these approaches fail to analyze and exploit the
inner-instance correlations between predicted points, impeding further
advancements. To address these challenges, we investigate the utilization of
inner-instance information for vectorized high-definition
mapping through Transformers and introduce InsightMapper. This paper
presents three novel designs within InsightMapper that leverage inner-instance
information in distinct ways, including hybrid query generation, inner-instance
query fusion, and inner-instance feature aggregation. Comparative experiments
are conducted on the NuScenes dataset, showcasing the superiority of our
proposed method. InsightMapper surpasses previous state-of-the-art (SOTA)
methods by 5.78 mAP and 5.12 TOPO, which assess topology correctness.
Simultaneously, InsightMapper maintains high efficiency during both training
and inference phases, resulting in remarkable comprehensive performance. The
project page for this work is available at
https://tonyxuqaq.github.io/projects/InsightMapper.
Comment: Code and demo will be available at
https://tonyxuqaq.github.io/projects/InsightMappe
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Remote photoplethysmography (rPPG), which aims at measuring heart activities
and physiological signals from facial video without any contact, has great
potential in many applications (e.g., remote healthcare and affective
computing). Recent deep learning approaches focus on mining subtle rPPG clues
using convolutional neural networks with limited spatio-temporal receptive
fields, which neglect the long-range spatio-temporal perception and interaction
for rPPG modeling. In this paper, we propose the PhysFormer, an end-to-end
video transformer based architecture, to adaptively aggregate both local and
global spatio-temporal features for rPPG representation enhancement. As key
modules in PhysFormer, the temporal difference transformers first enhance the
quasi-periodic rPPG features with temporal difference guided global attention,
and then refine the local spatio-temporal representation against interference.
Furthermore, we also propose label distribution learning and a
curriculum-learning-inspired dynamic constraint in the frequency domain, which provide
elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive
experiments are performed on four benchmark datasets to show our superior
performance on both intra- and cross-dataset testing. One highlight is that,
unlike most transformer networks, which need pretraining on large-scale datasets,
the proposed PhysFormer can be easily trained from scratch on rPPG datasets,
which makes it promising as a novel transformer baseline for the rPPG
community. The code will be released at
https://github.com/ZitongYu/PhysFormer.
Comment: Accepted by CVPR202
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
Semi-supervised learning is attracting growing attention due to its success
in exploiting unlabeled data. To mitigate potentially incorrect pseudo labels,
recent frameworks mostly set a fixed confidence threshold to discard uncertain
samples. This practice ensures high-quality pseudo labels, but incurs a
relatively low utilization of the whole unlabeled set. In this work, our key
insight is that these uncertain samples can be turned into certain ones, as
long as the confusion classes for the top-1 class are detected and removed.
Motivated by this, we propose a novel method dubbed ShrinkMatch to learn
from uncertain samples. For each uncertain sample, it adaptively seeks a shrunk
class space, which contains only the original top-1 class and the
remaining less likely classes. Since the confusion ones are removed in this
space, the re-calculated top-1 confidence can satisfy the pre-defined
threshold. We then impose a consistency regularization between a pair of
strongly and weakly augmented samples in the shrunk space to strive for
discriminative representations. Furthermore, considering the varied reliability
among uncertain samples and the gradually improved model during training, we
correspondingly design two reweighting principles for our uncertain loss. Our
method exhibits impressive performance on widely adopted benchmarks. Code is
available at https://github.com/LiheYoung/ShrinkMatch.
Comment: Accepted by ICCV 202
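The class-space shrinking step in the ShrinkMatch abstract can be sketched as follows. This is a hedged illustration only: the greedy rule (drop the highest-ranked runner-up classes one at a time until the renormalized top-1 confidence reaches the threshold), the function name `shrink_class_space`, and the threshold value are assumptions, and the released code may detect confusion classes differently.

```python
def shrink_class_space(probs, threshold=0.95):
    """Return (kept_classes, top1_confidence) for one sample.

    probs: predicted class probabilities for the sample.
    The top-1 class is always kept. Classes ranked just below it are treated
    as confusion classes (an assumption) and removed until the top-1
    confidence, re-normalized over the kept classes, reaches the threshold.
    """
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    top1 = order[0]
    for num_removed in range(len(order)):
        # Keep top-1 plus the less likely classes after the removed ones.
        kept = [top1] + order[1 + num_removed:]
        total = sum(probs[c] for c in kept)
        confidence = probs[top1] / total
        if confidence >= threshold:
            return sorted(kept), confidence
    # Fallback: only the top-1 class remains.
    return [top1], 1.0


# An uncertain sample: the top-1 confidence 0.5 is below the threshold 0.9.
kept, conf = shrink_class_space([0.5, 0.3, 0.15, 0.05], threshold=0.9)
print(kept, round(conf, 3))
```

In this example, classes 1 and 2 are removed as confusion classes, leaving classes 0 and 3 with a re-calculated top-1 confidence of 0.5 / 0.55. A consistency loss, as the abstract describes, would then be computed over the kept classes only.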
SAM3D: Segment Anything in 3D Scenes
In this work, we propose SAM3D, a novel framework that is able to predict
masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB
images without further training or finetuning. For a point cloud of a 3D scene
with posed RGB images, we first predict segmentation masks of RGB images with
SAM, and then project the 2D masks into the 3D points. Later, we merge the 3D
masks iteratively with a bottom-up merging approach. At each step, we merge the
point cloud masks of two adjacent frames with the bidirectional merging
approach. In this way, the 3D masks predicted from different frames are
gradually merged into the 3D masks of the whole 3D scene. Finally, we can
optionally ensemble the result from our SAM3D with the over-segmentation
results based on the geometric information of the 3D scenes. Our approach is
evaluated on the ScanNet dataset, and qualitative results demonstrate that our
SAM3D achieves reasonable and fine-grained 3D segmentation results without any
training or finetuning of SAM.
Comment: Technical Report. The code is released at
https://github.com/Pointcept/SegmentAnything3
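The first stage of the SAM3D pipeline, projecting 2D masks onto 3D points, can be sketched with a pinhole camera model. The sketch below is a simplified assumption rather than the released implementation: points are taken to be already transformed into the camera frame, occlusion checks against a depth map are omitted, and the function name `lift_mask_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def lift_mask_to_points(points_cam, mask, fx, fy, cx, cy):
    """Give each 3D point the 2D mask ID of the pixel it projects to.

    points_cam: list of (x, y, z) points in the camera frame (assumed).
    mask: 2D list of integer mask IDs, indexed as mask[row][col].
    Returns one label per point; -1 marks points that are behind the
    camera or that project outside the image.
    """
    height, width = len(mask), len(mask[0])
    labels = []
    for x, y, z in points_cam:
        if z <= 0:
            labels.append(-1)  # behind the camera
            continue
        # Pinhole projection to pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            labels.append(mask[v][u])
        else:
            labels.append(-1)  # outside the image
    return labels


mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
points = [(0.0, 0.0, 1.0), (-1.0, -1.0, 2.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
print(lift_mask_to_points(points, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0))
```

The per-frame point labels produced this way are what the bottom-up merging described in the abstract would then combine across adjacent frames.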